Stateful subnet manager failover in a middleware machine environment

ABSTRACT

A system and method can provide stateful subnet manager failover in a middleware machine environment. The system includes a policy daemon associated with each master subnet manager candidate in a subnet in the middleware machine environment. The policy daemon manages one or more policies for the subnet. The system also includes a transactional interface associated with the policy daemon co-located with a current master subnet manager. The transactional interface allows for updating the one or more policies using a policy update transaction. The policy daemon co-located with the master subnet manager operates to replicate the policy update transaction to one or more policy daemons co-located with the subnet managers that are master candidates associated with the master subnet manager, before committing the policy update transaction. Additionally, when the master subnet manager fails, the subnet managers operate to negotiate with each other and elect a new master subnet manager.

CLAIM OF PRIORITY

This application claims the benefit of priority on U.S. Provisional Patent Application No. 61/384,228, entitled “SYSTEM FOR USE WITH A MIDDLEWARE MACHINE PLATFORM” filed Sep. 17, 2010; U.S. Provisional Patent Application No. 61/484,390, entitled “SYSTEM FOR USE WITH A MIDDLEWARE MACHINE PLATFORM” filed May 10, 2011; U.S. Provisional Patent Application No. 61/493,330, entitled “STATEFUL SUBNET MANAGER FAILOVER IN A MIDDLEWARE MACHINE ENVIRONMENT” filed Jun. 3, 2011; U.S. Provisional Patent Application No. 61/493,347, entitled “PERFORMING PARTIAL SUBNET INITIALIZATION IN A MIDDLEWARE MACHINE ENVIRONMENT” filed Jun. 3, 2011; U.S. Provisional Patent Application No. 61/498,329, entitled “SYSTEM AND METHOD FOR SUPPORTING A MIDDLEWARE MACHINE ENVIRONMENT” filed Jun. 17, 2011, each of which applications are herein incorporated by reference.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF INVENTION

The present invention is generally related to computer systems and software such as middleware, and is particularly related to supporting a middleware machine environment.

BACKGROUND

Infiniband (IB) Architecture is a communications and management infrastructure that supports both I/O and interprocessor communications for one or more computer systems. An IB Architecture system can scale from a small server with a few processors and a few I/O devices to a massively parallel installation with hundreds of processors and thousands of I/O devices.

The IB Architecture defines a switched communications fabric allowing many devices to concurrently communicate with high bandwidth and low latency in a protected, remotely managed environment. An end node can communicate over multiple IB Architecture ports and can utilize multiple paths through the IB Architecture fabric. A multiplicity of IB Architecture ports and paths through the network are provided for both fault tolerance and increased data transfer bandwidth.

These are generally the areas that embodiments of the invention are intended to address.

SUMMARY

Described herein is a system and method that can provide stateful subnet manager failover in a middleware machine environment. In accordance with an embodiment, the system includes a policy daemon associated with each master subnet manager candidate in a subnet in the middleware machine environment. The policy daemon manages one or more policies for the subnet. The system also includes a transactional interface associated with the policy daemon that is co-located with a current master subnet manager. The transactional interface allows for updating the one or more policies using a policy update transaction. The policy daemon co-located with the master subnet manager operates to replicate the policy update transaction to one or more policy daemons co-located with the subnet managers that are master candidates associated with the master subnet manager, before committing the policy update transaction. Additionally, when the master subnet manager fails, the one or more subnet managers operate to negotiate with each other and elect a new master subnet manager.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows an illustration of an exemplary configuration for a middleware machine, in accordance with an embodiment of the invention.

FIG. 2 shows an illustration of a middleware machine environment, in accordance with an embodiment of the invention.

FIG. 3 shows an illustration of a middleware machine environment that supports a policy transaction, in accordance with an embodiment of the invention.

FIG. 4 illustrates an exemplary flow chart for supporting a policy transaction in a middleware machine environment, in accordance with an embodiment of the invention.

FIG. 5 shows an illustration of a stateful subnet manager failover scenario in a middleware machine environment, in accordance with an embodiment of the invention.

FIG. 6 illustrates an exemplary flow chart for implementing a system that supports stateful subnet manager failover in a middleware machine environment, in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

Described herein is a system and method for providing a middleware machine or similar platform. In accordance with an embodiment of the invention, the system comprises a combination of high performance hardware (e.g. 64-bit processor technology, high performance large memory, and redundant InfiniBand and Ethernet networking) together with an application server or middleware environment, such as WebLogic Suite, to provide a complete Java EE application server complex which includes a massively parallel in-memory grid, that can be provisioned quickly, and that can scale on demand. In accordance with an embodiment of the invention, the system can be deployed as a full, half, or quarter rack, or other configuration, that provides an application server grid, storage area network, and InfiniBand (IB) network. The middleware machine software can provide application server, middleware and other functionality such as, for example, WebLogic Server, JRockit or Hotspot JVM, Oracle Linux or Solaris, and Oracle VM. In accordance with an embodiment of the invention, the system can include a plurality of compute nodes, one or more IB switch gateways, and storage nodes or units, communicating with one another via an IB network. When implemented as a rack configuration, unused portions of the rack can be left empty or occupied by fillers.

In accordance with an embodiment of the invention, referred to herein as “Sun Oracle Exalogic” or “Exalogic”, the system is an easy-to-deploy solution for hosting middleware or application server software, such as the Oracle Middleware SW suite, or WebLogic. As described herein, in accordance with an embodiment the system is a “grid in a box” that comprises one or more servers, storage units, an IB fabric for storage networking, and all the other components required to host a middleware application. Significant performance can be delivered for all types of middleware applications by leveraging a massively parallel grid architecture using, e.g. Real Application Clusters and Exalogic Open storage. The system delivers improved performance with linear I/O scalability, is simple to use and manage, and delivers mission-critical availability and reliability.

FIG. 1 shows an illustration of an exemplary configuration for a middleware machine, in accordance with an embodiment of the invention. As shown in FIG. 1, the middleware machine 100 uses a single rack configuration that includes two gateway network switches, or leaf network switches, 102 and 103 that connect to twenty-eight server nodes. Additionally, there can be different configurations for the middleware machine. For example, there can be a half rack configuration that contains a portion of the server nodes, and there can also be a multi-rack configuration that contains a large number of servers.

As shown in FIG. 1, the server nodes can connect to the ports provided by the gateway network switches. As shown in FIG. 1, each server machine can have connections to the two gateway network switches 102 and 103 separately. For example, the gateway network switch 102 connects to the port 1 of the servers 1-14 106 and the port 2 of the servers 15-28 107, and the gateway network switch 103 connects to the port 2 of the servers 1-14 108 and the port 1 of the servers 15-28 109.

In accordance with an embodiment of the invention, each gateway network switch can have multiple internal ports that are used to connect with different servers, and the gateway network switch can also have external ports that are used to connect with an external network, such as an existing data center service network.

In accordance with an embodiment of the invention, the middleware machine can include a separate storage system 110 that connects to the servers through the gateway network switches. Additionally, the middleware machine can include a spine network switch 101 that connects to the two gateway network switches 102 and 103. As shown in FIG. 1, there can optionally be two links from the storage system to the spine network switch.

IB Fabric/Subnet

In accordance with an embodiment of the invention, an IB Fabric/Subnet in a middleware machine environment can contain a large number of physical hosts or servers, switch instances and gateway instances that are interconnected in a fat-tree topology.

FIG. 2 shows an illustration of a middleware machine environment, in accordance with an embodiment of the invention. As shown in FIG. 2, the middleware machine environment 200 includes an IB subnet or fabric 220 that connects with a plurality of end nodes. The IB subnet includes a plurality of subnet managers 211-214, each of which resides on one of a plurality of network switches 201-204. The subnet managers can communicate with each other using an in-band communication protocol 210, such as the Management Datagram (MAD)/Subnet Management Packet (SMP) based protocols or other protocol such as the Internet Protocol over IB (IPoIB).

In accordance with an embodiment of the invention, a single IP subnet can be constructed on the IB fabric allowing the switches to communicate securely among each other in the same IB fabric (i.e. full connectivity among all switches). The fabric based IP subnet can provide connectivity between any pair of switches when at least one route with operational links exists between the two switches. Recovery from link failures can be achieved if an alternative route exists by re-routing.

The management Ethernet interfaces of the switches can be connected to a single network providing IP level connectivity between all the switches. Each switch can be identified by two main IP addresses: one for the external management Ethernet and one for the fabric based IP subnet. Each switch can monitor connectivity to all other switches using both IP addresses, and can use either operational address for communication. Additionally, each switch can have a point-to-point IP link to each directly connected switch on the fabric. Hence, there can be at least one additional IP address.

IP routing setups allow a network switch to route traffic to another switch via an intermediate switch using a combination of the fabric IP subnet, the external management Ethernet network, and one or more fabric level point-to-point IP links between pairs of switches. IP routing allows external management access to a network switch to be routed via an external Ethernet port on the network switch, as well as through a dedicated routing service on the fabric.

The IB fabric includes multiple network switches with management Ethernet access to a management network. There is in-band physical connectivity between the switches in the fabric. In one example, there is at least one in-band route of one or more hops between each pair of switches, when the IB fabric is not degraded. Management nodes for the IB fabric include network switches and management hosts that are connected to the IB fabric.

A subnet manager can be accessed via any of its private IP addresses. The subnet manager can also be accessible via a floating IP address that is configured for the master subnet manager when the subnet manager takes on the role as a master subnet manager, and that is un-configured when the subnet manager is explicitly released from the role. A master IP address can be defined for both the external management network as well as for the fabric based management IP network. No special master IP address needs to be defined for point-to-point IP links.

In accordance with an embodiment of the invention, each physical host can be virtualized using virtual machine based guests. There can be multiple guests existing concurrently per physical host, for example one guest per CPU core. Additionally, each physical host can have at least one dual-ported Host Channel Adapter (HCA), which can be virtualized and shared among guests, so that the fabric view of a virtualized HCA is a single dual-ported HCA just like a non-virtualized/shared HCA.

The IB fabric can be divided into a dynamic set of resource domains implemented by IB partitions. Each physical host and each gateway instance in an IB fabric can be a member of multiple partitions. Also, multiple guests on the same or different physical hosts can be members of the same or different partitions. The number of the IB partitions for an IB fabric may be limited by the P_Key table size.

In accordance with an embodiment of the invention, a guest may open a set of virtual network interface cards (vNICs) on two or more gateway instances that are accessed directly from a vNIC driver in the guest. The guest can migrate between physical hosts while either retaining or having updated vNIC associations.

In accordance with an embodiment of the invention, switches can start up in any order and can dynamically select a master subnet manager according to different negotiation protocols, for example an IB specified negotiation protocol. If no partitioning policy is specified, a default partitioning enabled policy can be used. Additionally, the management node partition and the fabric based management IP subnet can be established independently of any additional policy information and independently of whether the complete fabric policy is known by the master subnet manager. In order to allow fabric level configuration policy information to be synchronized using the fabric based IP subnet, the subnet manager can start up initially using the default partition policy. When fabric level synchronization has been achieved, the partition configuration, which is current for the fabric, can be installed by the master subnet manager.

Policy Transaction in a Middleware Machine Environment

In accordance with an embodiment of the invention, a system and method can support a policy transaction in a middleware machine environment. The system includes a policy daemon associated with a master subnet manager in an IB subnet in the middleware machine environment. The policy daemon manages one or more policies for the IB subnet. The system also includes a transactional interface associated with the policy daemon. The transactional interface allows for updating the one or more policies using a policy update transaction. Additionally, the master subnet manager is associated with one or more subnet managers that are master candidates in the middleware machine environment. The policy daemon associated with the master subnet manager operates to replicate the policy update transaction to the one or more subnet managers before committing the policy update transaction.

FIG. 3 shows an illustration of a middleware machine environment that supports a policy transaction, in accordance with an embodiment of the invention. As shown in FIG. 3, the middleware machine environment 300 includes an IB subnet or fabric 320 that manages a plurality of end nodes. The IB subnet includes a plurality of subnet managers 321-324, each of which resides on one of a plurality of network switches 301-304. The subnet managers can communicate with each other using an in-band communication protocol 310, such as the Internet Protocol over Infiniband (IPoIB). The subnet managers can negotiate among each other and elect a master subnet manager A 321, which is responsible for configuring and managing the middleware machine environment. Additionally, the subnet managers B-D are standby master candidates in the middleware machine environment, each of which is ready to take over as the master subnet manager when necessary.

In accordance with an embodiment of the invention, each network switch can connect with one or more end nodes, such as the host servers within the middleware machine environment. Both the network switch and the subnet managers residing on top of the network switch can be considered as management nodes from the perspective of a network high availability management model. The network switch can be either a leaf switch that communicates directly with the end nodes, or a spine switch that communicates with the end nodes through the leaf switches. The network switches can communicate with the host servers via the switch ports of the network switches and the host ports of the host servers. In an IB network, partitions can be defined to specify which end ports are able to communicate with other end ports.

In accordance with an embodiment of the invention, the middleware machine environment employs a fat-tree topology, which allows a small number of switches to sit at the top layers of the fat tree while maintaining a large number of end nodes as leaves of the tree.

In accordance with an embodiment of the invention, the system can provide a plurality of policy daemons 311-314, each of which is associated with a subnet manager. The policy daemon that is co-located with the master subnet manager is responsible for configuring and managing the end nodes in the middleware machine environment using one or more policies. One exemplary policy managed by a policy daemon in a middleware machine environment can be a partition configuration policy. The partition configuration policy can be supplied to the subnet through an initialization policy transaction.

For example, a middleware machine environment that includes end nodes A, B and C can be partitioned into two groups: a Group I that includes nodes A and B and a Group II that includes node C. A partition configuration policy can define a partition update that requires deleting node B from Group I before adding node B into Group II. This partition configuration policy can require that the master subnet manager not allow a new partition to add node B into Group II without first deleting node B from Group I. This partition configuration policy can be enforced by the master subnet manager using a policy daemon.
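
By way of illustration only, the delete-before-add rule in this example can be sketched as follows. The `PartitionTable` class, its methods, and the exception name are hypothetical and are introduced solely for this sketch; they do not represent an actual subnet manager or policy daemon interface.

```python
class PartitionPolicyError(Exception):
    """Raised when an update would violate the example partition policy."""


class PartitionTable:
    """Hypothetical in-memory view of group membership (group -> set of nodes)."""

    def __init__(self, groups):
        self.groups = {name: set(nodes) for name, nodes in groups.items()}

    def remove(self, node, group):
        self.groups[group].discard(node)

    def add(self, node, group):
        # Example policy: a node must be deleted from its current group
        # before it can be added to a different group.
        for other, members in self.groups.items():
            if other != group and node in members:
                raise PartitionPolicyError(
                    f"{node} must be deleted from {other} before joining {group}"
                )
        self.groups.setdefault(group, set()).add(node)


table = PartitionTable({"Group I": {"A", "B"}, "Group II": {"C"}})
rejected = None
try:
    table.add("B", "Group II")       # rejected: B is still in Group I
except PartitionPolicyError as err:
    rejected = str(err)
table.remove("B", "Group I")
table.add("B", "Group II")           # accepted after the delete
```

In this sketch the first attempt to add node B is refused, and the same request succeeds once node B has been deleted from Group I.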

In accordance with an embodiment of the invention, the system can provide a transactional interface 308 that is associated with the policy daemon. The transactional interface allows for updating the one or more policies managed by the policy daemon using a policy update transaction 309. The policy daemon associated with the master subnet manager can replicate the policy update transaction to the subnet manager master candidates before committing the policy update transaction. Additionally, the system provides a command interface that is responsible for providing policies to the master subnet manager.

By replicating the policy updates from the master subnet manager to the subnet manager master candidates, the system can ensure that the policies are synchronized within the middleware machine environment. When the standby subnet manager takes over and becomes the new master subnet manager, the functioning of the middleware machine environment can be uninterrupted and the communication in the middleware machine environment can remain undisturbed. Additionally, the system can remove all stale policy information before applying the new policy or the policy updates, in order to prevent inconsistency between the master subnet manager and different instances of the subnet manager master candidates.

In accordance with an embodiment of the invention, a policy update transaction can include either a new policy or a set of policy updates. Each policy update transaction can be represented using a unique version number. A master subnet manager can consider a policy associated with the highest version number, in its knowledge, as the current policy to be used in the middleware machine environment. In one embodiment, the system is configured so that there is only one policy update transaction in progress at any point of time in the subnet.
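
As a minimal sketch of the version rule described above, the selection of the current policy can be expressed as taking the entry with the highest version number among the policies a subnet manager knows about. The `PolicyVersion` record and the `current_policy` function are hypothetical names assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyVersion:
    """Hypothetical record: one committed policy update and its version number."""
    version: int
    payload: str


def current_policy(known):
    """Return the known policy with the highest version number, or None."""
    return max(known, key=lambda p: p.version, default=None)


known = [PolicyVersion(1, "initial"), PolicyVersion(3, "update-b"),
         PolicyVersion(2, "update-a")]
latest = current_policy(known)
```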

In accordance with an embodiment of the invention, if the replication of the policy updates from the master subnet manager to the standby subnet managers fails, or alternatively the master subnet manager fails, then the policy update transaction is not committed, and the system does not apply the policy updates on the middleware machine environment in order to preserve consistency in policy configuration. Furthermore, the system allows a user or an administrator to intervene and manually set up the environment.
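
The replicate-before-commit behavior can be sketched as a two-phase update, shown here under simplifying assumptions. The `PolicyDaemon` class, its `stage`, `commit` and `abort` methods, and the use of `ConnectionError` to model a failed replication are all hypothetical. The sketch also assumes that every standby must accept the update, whereas an embodiment may instead require only a quorum of policy daemons.

```python
class PolicyDaemon:
    """Hypothetical policy daemon holding a staged and a committed policy."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.staged = None       # (version, policy) received but not committed
        self.committed = None    # (version, policy) currently in force

    def stage(self, version, policy):
        if not self.reachable:
            raise ConnectionError(f"{self.name} unreachable")
        self.staged = (version, policy)

    def commit(self):
        self.committed, self.staged = self.staged, None

    def abort(self):
        self.staged = None


def update_policy(master, standbys, version, policy):
    """Replicate to all daemons first; commit only if every replication succeeds."""
    daemons = [master, *standbys]
    try:
        for daemon in daemons:
            daemon.stage(version, policy)
    except ConnectionError:
        # Replication failed: nothing is committed and nothing is applied.
        for daemon in daemons:
            daemon.abort()
        return False
    for daemon in daemons:
        daemon.commit()
    return True
```

When `update_policy` returns `False`, no daemon has changed its committed policy, which corresponds to the case in which an administrator may need to intervene.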

In accordance with an embodiment of the invention, each policy daemon can have a close relationship with the subnet manager instance on the corresponding node. The subnet manager can perform a synchronization operation with the local policy daemon whenever it becomes a master subnet manager. This ensures that the policy daemon can prepare all required current policy information that the subnet manager needs in order to initialize and maintain the state of the subnet. For example, such information includes the current partitioning configuration which can be provided as a local file.
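
One possible way for a policy daemon to hand the current partitioning configuration to the local subnet manager as a file is sketched below. The function name, the file name used in the usage lines, and the write-then-rename approach are assumptions made for illustration; the description above only states that the configuration can be provided as a local file.

```python
import os
import tempfile


def publish_policy_file(path, text):
    """Write text to path so a reader sees the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # Renaming within one directory replaces the target in a single step.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "partitions.conf")   # hypothetical file name
publish_policy_file(target, "version 1\n")
publish_policy_file(target, "version 2\n")
```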

Additionally, the various policy daemon instances can cooperate to replicate the policy information that is supposed to be shared among the subnet managers. For example, the local synchronization with the local subnet manager instance can ensure that the replication of a new version of a configuration file is complete and accurate before the master subnet manager starts to apply the new policy.

In accordance with an embodiment of the invention, when a standby subnet manager reboots, it can synchronize with the current master or other currently available master candidates for the current fabric policy.

FIG. 4 illustrates an exemplary flow chart for supporting a policy transaction in a middleware machine environment, in accordance with an embodiment of the invention. As shown in FIG. 4, at step 401, a policy daemon can be associated with a master subnet manager in a subnet in the middleware machine environment, wherein the policy daemon manages one or more policies for the subnet. Furthermore, a transactional interface can be associated with the policy daemon at step 402. The transactional interface allows for updating the one or more policies managed by the policy daemon associated with the master subnet manager using a policy update transaction. Then, at step 403, the policy daemon co-located with the master subnet manager can replicate the policy update transaction to one or more policy daemons co-located with the one or more subnet managers that are master candidates before committing the policy update transaction.

Stateful Subnet Manager Failover

FIG. 5 shows an illustration of a stateful subnet manager failover scenario in a middleware machine environment, in accordance with an embodiment of the invention. As shown in FIG. 5, the middleware machine environment 500 includes a plurality of network switches 501-504 together with a plurality of subnet managers 521-524 that manage a plurality of end nodes. The plurality of subnet managers can communicate with each other using an in-band communication protocol 510, such as the Internet Protocol over Infiniband (IPoIB).

In the example as shown in FIG. 5, when an old master subnet manager A 521 fails, the rest of the subnet managers B-D can negotiate with each other and elect a new master subnet manager C, which is responsible for configuring and managing the subnet in the middleware machine environment. The new master subnet manager C can determine the most recent versions of the fabric configuration policy information along with all available subnet managers B-D. Additionally, a transaction interface 308 associated with the new master subnet manager C is used by the system to support a new policy update transaction 509.

In order to determine the current fabric configuration policy information, the system can use a quorum-based policy in the policy daemon, which specifies a minimum number of the subnet managers needed in the middleware machine environment in order to support a policy update. For example, a quorum-based policy can require that more than half of all the standby subnet managers be in synchronization. If half or fewer of all the standby subnet managers have the same policy, then a quorum cannot be reached, and no fabric policy changes can be implemented until either a quorum has been reached, or until a system administrator redefines the master candidate set. For example, a “split-brain” condition can be detected in a middleware machine environment with only two subnet managers. When a single point of failure disables one subnet manager, then there is only one subnet manager existing in the system and the quorum-based policy can prevent the subnet manager which detects the “split-brain” condition from taking on any master role.

In accordance with an embodiment of the invention, a policy update transaction 509 is committed only when a quorum of different policy daemons all agree on the policy update. Additionally, the policy daemon 513 can ensure that the current policy is in synch within a quorum of policy daemons before allowing a newly elected master subnet manager 523 to complete initialization of the subnet.
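
A minimal sketch of the quorum check follows, assuming that a quorum means strictly more than half of the configured master candidates and that each reachable policy daemon reports the version number of the policy it holds. The function names and the use of plain integers for versions are hypothetical.

```python
from collections import Counter


def quorum_size(configured):
    """Smallest count that is strictly more than half of the configured candidates."""
    return configured // 2 + 1


def quorum_policy_version(reported_versions, configured):
    """Return the version held by a quorum of daemons, or None if there is no quorum.

    reported_versions lists one version number per reachable policy daemon;
    configured is the total number of configured master candidates.
    """
    if not reported_versions:
        return None
    version, count = Counter(reported_versions).most_common(1)[0]
    return version if count >= quorum_size(configured) else None
```

For instance, with four configured candidates, three daemons reporting version 7 form a quorum, while two reachable daemons out of four do not, so the newly elected master would not complete initialization in the second case.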

In accordance with an embodiment of the invention, a quorum based scheme has the advantage of being able to both implement and change a policy following one or more failures as long as a sufficient level of redundancy is provided, i.e. the system is configured with a sufficient number of independent master subnet manager candidate instances.

For example, a decision about implementing, or changing and implementing, a policy can be based on an assumption that there are no conflicting policy decisions made among other potential master candidates. If only exactly half of the configured standby subnet managers are available (e.g. 2 out of a total of 4 subnet managers or 1 out of a total of 2 subnet managers), then no decision to implement or change the policy is permitted. Thus, a population of 3 or 4 master candidates can survive a single point of failure and still be able to establish a quorum that can make decisions about implementing or changing the policy. Furthermore, a population of 5 or 6 master candidates can tolerate 2 failures, 7 and 8 master candidates can tolerate 3 failures, and so on.
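
The figures in this example follow from requiring strictly more than half of the configured candidates to remain available. As a small worked sketch, with `tolerated_failures` being a hypothetical helper name, the number of failures a population can survive is the integer part of (n - 1) / 2:

```python
def tolerated_failures(candidates):
    """Failures a population can survive while still forming a majority quorum."""
    return (candidates - 1) // 2


# Population size -> tolerated failures, for 2 through 8 master candidates.
tolerance = {n: tolerated_failures(n) for n in range(2, 9)}
```

Under this rule a population of 2 tolerates no failures, which is why a two-candidate configuration cannot establish a quorum after a single point of failure.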

In accordance with an embodiment of the invention, a consensus based scheme can be used in the system when it is impossible to establish a quorum (or majority) following a single point of failure, for example for a configuration with only two master subnet manager candidates. The consensus based rules can implement the current policy when at least one single master subnet manager can be established. However, in order to preserve consistency in the system, the current policy may not be changed when any master subnet manager candidate is not a part of the update transaction.

The advantage of the consensus based scheme is that any subnet manager that becomes the master can immediately configure the subnet based on the current policy as long as the local policy daemon can determine that the policy state reflects a committed update transaction. The drawback is that a single point of failure that makes a single subnet manager master candidate unavailable can prevent any further policy update transactions.
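
The trade-off of the consensus based scheme can be sketched as two separate checks, under the assumption that applying the current policy needs only a committed local state while changing the policy needs every configured master candidate to take part. Both function names and the candidate labels are hypothetical.

```python
def can_apply_current_policy(local_state_committed):
    """A new master may configure the subnet if its local policy state is committed."""
    return bool(local_state_committed)


def can_update_policy(configured, participating):
    """A policy update may proceed only if every configured candidate participates."""
    return set(configured) <= set(participating)


configured = {"SM-A", "SM-B"}        # a two-candidate configuration
survivor_can_apply = can_apply_current_policy(True)
survivor_can_update = can_update_policy(configured, {"SM-B"})
```

In this sketch the surviving candidate can still apply the committed policy, but no further update is accepted until the missing candidate returns or the candidate set is redefined.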

Implementation without Third Party Constraints

In accordance with an embodiment of the invention, an implementation of the system as described above requires only a minimal change in an existing subnet manager implementation, and allows the subnet manager implementation to be based on third party source code, such as open source shared code. The system allows the handling of stateful fail-over to be implemented without the open source constraints and also can be independent of the IB fabric that the subnet manager relates to.

FIG. 6 illustrates an exemplary flow chart for implementing a system that supports stateful subnet manager failover in a middleware machine environment, in accordance with an embodiment of the invention. As shown in FIG. 6, at step 601, the subnet manager employs a core logic implementation. The core logic of the subnet manager can implement an IB standard such as the OpenSM standard, which only supports stateless failover. Furthermore, at step 602, a policy daemon can be associated with the subnet manager in order to inject critical policy information for the middleware machine environment into the core logic implementation in the subnet manager. The core logic implementation in the subnet manager may not be aware of the policy daemon. The implementation of the policy daemon requires only minimal change to the core logic implementation, and can be independent of and separated from the core logic implementation in the subnet manager. Additionally, a transactional interface can be associated with the policy daemon, at step 603. The transactional interface allows transactional behavior for policy updates in the middleware machine environment. The transactional behavior can ensure full ACID (atomicity, consistency, isolation, durability) properties for the policy updates without a need of including the transaction logic within the core logic implementation in the subnet manager.

The present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computer, computing device, machine, or microprocessor, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.

In some embodiments, the present invention includes a computer program product which is a storage medium or computer readable medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.

The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.

1. A system for supporting policy transaction in a middleware machine environment, comprising: one or more microprocessors; a policy daemon, running on the one or more microprocessors, associated with a master subnet manager in a subnet in the middleware machine environment, wherein the policy daemon manages one or more policies for the subnet; a transactional interface associated with the policy daemon, wherein the transactional interface allows for updating the one or more policies managed by the policy daemon associated with the master subnet manager using a policy update transaction; and wherein the master subnet manager is associated with one or more subnet managers that are master candidates in the subnet, and the policy daemon associated with the master subnet manager operates to replicate the policy update transaction to the one or more subnet managers before committing the policy update transaction.
 2. The system according to claim 1, wherein: the subnet is an InfiniBand (IB) subnet that includes a plurality of management nodes connecting with a plurality of host servers.
 3. The system according to claim 2, wherein: the plurality of management nodes include one or more network switches, wherein each said subnet manager resides on a network switch.
 4. The system according to claim 1, wherein: each said subnet manager is associated with a different policy daemon.
 5. The system according to claim 4, wherein: the policy update transaction is committed only when a quorum of said different policy daemons agrees.
 6. The system according to claim 1, wherein: when the master subnet manager fails, the one or more subnet managers operate to negotiate with each other and elect a new master subnet manager, which is responsible for configuring and managing the middleware machine environment.
 7. The system according to claim 1, wherein: the subnet uses an in-band communication protocol to connect the master subnet manager with the one or more subnet managers.
 8. The system according to claim 1, wherein: a said policy is a partition policy that can define a partition configuration in the subnet, and wherein the partition policy can be supplied to the subnet through an initialization policy transaction.
 9. The system according to claim 1, further comprising: a command interface that is responsible for providing policies to the master subnet manager via the transactional interface.
 10. The system according to claim 1, wherein: the master subnet manager can use a default partitioning policy for initialization when no partitioning policy is specified.
 11. The system according to claim 1, wherein: the master subnet manager ensures that functioning of the middleware machine environment is not interrupted when a standby subnet manager takes over and becomes a new master subnet manager.
 12. The system according to claim 1, wherein: all stale policy information can be removed before applying the new policy or the policy updates.
 13. The system according to claim 1, wherein: the policy update transaction can include either a new policy or a set of policy updates, and the policy update transaction can be represented using a unique version number.
 14. The system according to claim 13, wherein: the master subnet manager considers a policy associated with a highest version number as the current policy used in the middleware machine environment, and the subnet allows one policy update transaction in progress at any point of time in the subnet.
 15. The system according to claim 1, wherein: the policy daemon ensures that current policy is in synch within a quorum of policy daemons before allowing a newly elected master subnet manager to complete initialization of the subnet.
 16. The system according to claim 1, wherein: a consensus based scheme is used when it is impossible to establish a quorum following a single point of failure, wherein the consensus based rules can implement a current policy when at least one single master subnet manager is established and the current policy cannot be changed when any master subnet manager candidate is not a part of the upgrade transaction.
 17. The system according to claim 1, wherein: the subnet manager is implemented with a core logic based on third party source code.
 18. The system according to claim 17, wherein: the policy daemon can inject critical policy information for the middleware machine environment into the core logic implementation in the subnet manager.
 19. A method for supporting policy transaction in a middleware machine environment, comprising: associating a policy daemon running on one or more microprocessors with a master subnet manager in a subnet in the middleware machine environment, wherein the policy daemon manages one or more policies for the subnet; associating a transactional interface with the policy daemon, wherein the transactional interface allows for updating the one or more policies managed by the policy daemon associated with the master subnet manager using a policy update transaction; and replicating, via the policy daemon associated with the master subnet manager, the policy update transaction to one or more subnet managers that are master candidates associated with the master subnet manager before committing the policy update transaction.
 20. A machine readable medium having instructions stored thereon that when executed cause a system to perform the steps of: associating a policy daemon running on one or more microprocessors with a master subnet manager in a subnet in the middleware machine environment, wherein the policy daemon manages one or more policies for the subnet; associating a transactional interface with the policy daemon, wherein the transactional interface allows for updating the one or more policies managed by the policy daemon associated with the master subnet manager using a policy update transaction; and replicating, via the policy daemon associated with the master subnet manager, the policy update transaction to one or more subnet managers that are master candidates associated with the master subnet manager before committing the policy update transaction.
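
By way of non-limiting illustration, the version-number and quorum behavior recited in the claims above (the highest version number identifies the current policy, and the policy is brought in synch within a quorum before a newly elected master completes initialization) could be sketched as follows. The function names, the dictionary of reported versions, and the simple-majority quorum rule are assumptions made for this sketch only and do not limit the claims.

```python
def current_policy_version(reported_versions, total_daemons):
    """Return the highest reported policy version, or None without a quorum.

    reported_versions maps a (hypothetical) daemon name to the highest
    policy version that daemon has committed. Assumption: a quorum is a
    simple majority of all policy daemons in the subnet.
    """
    if len(reported_versions) * 2 <= total_daemons:
        return None
    return max(reported_versions.values())


def daemons_needing_sync(reported_versions, total_daemons):
    """List the daemons a newly elected master would bring up to date."""
    current = current_policy_version(reported_versions, total_daemons)
    if current is None:
        return None  # no quorum: initialization should not complete
    return sorted(name for name, version in reported_versions.items()
                  if version < current)


if __name__ == "__main__":
    reported = {"sm-a": 3, "sm-b": 4, "sm-c": 3}
    print(current_policy_version(reported, total_daemons=3))
    print(daemons_needing_sync(reported, total_daemons=3))
```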